NSF PAR Search | NSF Public Access Repository

Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations

Akash, Ranit D; Kumar, Ashish; Monjezi, Verya; Trivedi, Ashutosh; Tan, Gang; Tizpaz-Niari, Saeid (November 2025, 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE))

Full Text Available
An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software

https://doi.org/10.1145/3744916.3764575

Gogani-Khiabani, Sina; Trivedi, Ashutosh; Saha, Diptikalyan; Tizpaz-Niari, Saeid (September 2025, The International Conference on Software Engineering (ICSE'26))

Full Text Available
Performance of LLMs on VITA test: potential for AI-assisted tax returns for low income taxpayers

https://doi.org/10.1007/s10506-025-09465-7

Gogani-Khiabani, Sina; Trivedi, Ashutosh; Chyi, ShinPing; Tizpaz-Niari, Saeid (July 2025, Artificial Intelligence and Law)

This paper investigates the performance of a diverse set of large language models (LLMs), including leading closed-source (GPT-4, GPT-4o mini, Claude 3.5 Haiku) and open-source (Llama 3.1 70B, Llama 3.1 8B) models, alongside the earlier GPT-3.5, within the context of U.S. tax resolutions. AI-driven solutions like these have made substantial inroads into legal-critical systems with significant socio-economic implications. However, their accuracy and reliability have not been assessed in some legal domains, such as tax. Using the Volunteer Income Tax Assistance (VITA) certification tests, endorsed by the US Internal Revenue Service (IRS) for tax volunteering, this study compares these LLMs to evaluate their potential utility in assisting both tax volunteers and taxpayers, particularly those with low and moderate income. Since the answers to these questions are not publicly available, we first analyzed 130 questions with tax domain experts and developed the ground truth for each question. We then benchmarked these diverse LLMs against the ground truths using both the original VITA questions and syntactically perturbed versions (a total of 390 questions) to assess genuine understanding versus memorization/hallucinations. Our comparative analysis reveals distinct performance differences: closed-source models (GPT-4, Claude 3.5 Haiku, GPT-4o mini) generally demonstrated higher accuracy and robustness compared to GPT-3.5 and the open-source Llama models. For instance, on basic multiple-choice questions, top models like GPT-4 and Claude 3.5 Haiku achieved 83.33% accuracy, surpassing GPT-3.5 (54.17%) and the open-source Llama 3.1 8B (50.00%). These findings generally hold across both original and perturbed questions. However, the paper acknowledges that these developments are initial indicators, and further research is necessary to fully understand the implications of deploying LLMs in this domain.
A critical limitation observed across all evaluated models was significant difficulty with open-ended questions, which require accurate numerical calculation and application of tax rules. We hope that this paper provides a means and a standard to evaluate the efficacy of current and future LLMs in the tax domain.
Full Text Available
AI-Enabled Tax Assistance for Low/Moderate Income Taxpayers: An Evaluation of RAG-based LLMs for VITA Volunteer Support

Gogani-Khiabani, Sina; Buddhi, Rohan S; Dabral, Yogesh; Chyi, ShinPing; Trivedi, Ashutosh; Tizpaz-Niari, Saeid (June 2025, 15th Annual IRS/TPC Joint Research Conference on Tax Administration (IRS-TPC 2025))

Full Text Available
Experts, commercial software, and the internal revenue service: American taxpayer perceptions of trust and procedural justice.

https://doi.org/10.1037/lhb0000600

Reed, Krystia; Wagner, Morgan; Tizpaz-Niari, Saeid; Trivedi, Ashutosh (June 2025, Law and Human Behavior)

Full Text Available
Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations

https://doi.org/10.1109/ASE63991.2025.00141

Akash, Ranit D; Kumar, Ashish; Monjezi, Verya; Trivedi, Ashutosh; Tan, Gang; Tizpaz-Niari, Saeid (November 2025, IEEE)

Full Text Available
Data Science in a Mathematics Classroom: Lessons on AI Fairness

https://doi.org/10.1145/3686852.3686889

Sotelo, Berenice; Wieseman, Kirsten; Tizpaz-Niari, Saeid (October 2024, ACM)

Full Text Available
Timing Side-Channel Mitigation via Automated Program Repair

https://doi.org/10.1145/3678169

Ruan, Haifeng; Noller, Yannic; Tizpaz-Niari, Saeid; Chattopadhyay, Sudipta; Roychoudhury, Abhik (November 2024, ACM Transactions on Software Engineering and Methodology)

Side-channel vulnerability detection has gained prominence recently due to Spectre and Meltdown attacks. Techniques for side-channel detection range from fuzz testing to program analysis and program composition. Existing side-channel mitigation techniques repair the vulnerability at the IR/binary level or use runtime monitoring solutions. In both cases, the source code itself is not modified, can evolve while keeping the vulnerability, and the developer would get no feedback on how to develop secure applications in the first place. Thus, these solutions do not help the developer understand the side-channel risks in her code and do not provide guidance to avoid code patterns with side-channel risks. In this article, we present Pendulum, the first approach for automatically locating and repairing side-channel vulnerabilities in the source code, specifically for timing side channels. Our approach uses a quantitative estimation of found vulnerabilities to guide the fix localization, which goes hand-in-hand with a pattern-guided repair. Our evaluation shows that Pendulum can repair a large number of side-channel vulnerabilities in real-world applications. Overall, our approach integrates vulnerability detection, quantization, localization, and repair into one unified process. This also enhances the possibility of our side-channel mitigation approach being adopted into programming environments.
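To make the notion of a source-level timing side channel concrete, the following is a minimal sketch of one well-known vulnerable pattern (an early-exit comparison) and one common repair (accumulating differences without branching on secret data). It is an illustration only and is not taken from Pendulum: the abstract does not list its repair patterns, and the function names `leaky_compare` and `repaired_compare` are hypothetical. The step counter is a stand-in for running time, since pure Python gives no real constant-time guarantee.

```python
import hmac
from typing import Tuple


def leaky_compare(secret: bytes, guess: bytes) -> Tuple[bool, int]:
    """Early-exit comparison: work done depends on the matching prefix length."""
    steps = 0  # proxy for running time (hypothetical instrumentation)
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # leaks how many leading bytes matched
    return True, steps


def repaired_compare(secret: bytes, guess: bytes) -> Tuple[bool, int]:
    """Same result, but every byte is always inspected."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    diff = 0
    for s, g in zip(secret, guess):
        steps += 1
        diff |= s ^ g  # accumulate mismatches without branching on them
    return diff == 0, steps


if __name__ == "__main__":
    secret = b"hunter2!"
    for guess in (b"xxxxxxxx", b"hunxxxxx", b"hunter2!"):
        print(guess, leaky_compare(secret, guess), repaired_compare(secret, guess))
    # In real Python code, prefer the standard library helper:
    print(hmac.compare_digest(secret, b"hunter2!"))
```

In the leaky version the step count grows with the number of correct leading bytes, which is the signal an attacker measures; in the repaired version it depends only on the input length.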
Full Text Available
Fairness Testing Through Extreme Value Theory

https://doi.org/10.1109/ICSE55347.2025.00070

Monjezi, Verya; Trivedi, Ashutosh; Kreinovich, Vladik; Tizpaz-Niari, Saeid (April 2025, IEEE)

Data-driven software is increasingly being used as a critical component of automated decision-support systems. Since this class of software learns its logic from historical data, it can encode or amplify discriminatory practices. Previous research on algorithmic fairness has focused on improving “average-case” fairness. On the other hand, fairness at the extreme ends of the spectrum, which often signifies lasting and impactful shifts in societal attitudes, has received significantly less emphasis. Leveraging the statistics of extreme value theory (EVT), we propose a novel fairness criterion called extreme counterfactual discrimination (ECD). This criterion estimates the worst-case amounts of disadvantage in outcomes for individuals solely based on their memberships in a protected group. Utilizing tools from search-based software engineering and generative AI, we present a randomized algorithm that samples a statistically significant set of points from the tail of ML outcome distributions even if the input dataset lacks a sufficient number of relevant samples. We conducted several experiments on four ML models (deep neural networks, logistic regression, and random forests) over 10 socially relevant tasks from the literature on algorithmic fairness. First, we evaluate the generative AI methods and find that they generate sufficient samples to infer valid EVT distribution in 95% of cases. Remarkably, we found that the prevalent bias mitigators reduce the average-case discrimination but increase the worst-case discrimination significantly in 35% of cases. We also observed that even the tail-aware mitigation algorithm—MiniMax-Fairness—increased the worst-case discrimination in 30% of cases. We propose a novel ECD-based mitigator that improves fairness in the tail in 90% of cases with no degradation of the average-case discrimination. We hope that the EVT framework serves as a robust tool for evaluating fairness in both average-case and worst-case discrimination.
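The tail-focused idea in this abstract can be illustrated with a generic peaks-over-threshold fit from extreme value theory. The sketch below is not the paper's ECD criterion or its sampling algorithm, neither of which is specified in this listing. It assumes a hypothetical list of per-individual outcome gaps (for example, the change in model output when only the protected attribute is flipped), uses synthetic data, and fits a generalized Pareto distribution (GPD) to the exceedances by the method of moments, which is only valid when the shape parameter is below 1/2.

```python
import random
import statistics
from typing import List, Tuple


def fit_gpd_moments(exceedances: List[float]) -> Tuple[float, float]:
    """Method-of-moments GPD fit; returns (shape xi, scale sigma).

    Uses mean = sigma / (1 - xi) and
    variance = sigma^2 / ((1 - xi)^2 * (1 - 2 * xi)), valid for xi < 1/2.
    """
    m = statistics.mean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v  # equals 1 - 2 * xi under the GPD model
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def tail_fit(gaps: List[float], quantile: float = 0.95) -> Tuple[float, float, float]:
    """Pick a high empirical quantile as threshold and fit the tail above it."""
    ordered = sorted(gaps)
    threshold = ordered[int(quantile * len(ordered))]
    exceedances = [g - threshold for g in ordered if g > threshold]
    xi, sigma = fit_gpd_moments(exceedances)
    return threshold, xi, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for counterfactual outcome gaps (assumption, not real data).
    gaps = [rng.expovariate(1.0) for _ in range(20000)]
    threshold, xi, sigma = tail_fit(gaps)
    print(f"threshold={threshold:.3f} shape={xi:.3f} scale={sigma:.3f}")
```

A larger fitted shape parameter indicates a heavier tail, that is, a few individuals experiencing much larger gaps than the average-case numbers suggest. This is the kind of worst-case behavior that average-case fairness metrics can miss.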
Full Text Available
NeuFair: Neural Network Fairness Repair with Dropout

https://doi.org/10.1145/3650212.3680380

Dasu, Vishnu Asutosh; Kumar, Ashish; Tizpaz-Niari, Saeid; Tan, Gang (September 2024, ACM)

Full Text Available